ABSTRACT

We provide a brief overview of the Galaxy Zoo and Zooniverse projects, including a short discussion of the 
history of, and motivation for, these projects as well as reviewing the science these innovative internet-based 
citizen science projects have produced so far. We briefly describe the method of applying en-masse human 
pattern recognition capabilities to complex data in data-intensive research. We also provide a discussion of 
the lessons learned from developing and running these community-based projects including thoughts on future 
applications of this methodology. This review is intended to give the reader a quick and simple introduction to 
the Zooniverse. 

Subject headings: astronomical databases, methods: data analysis, galaxies: general, galaxies: spiral, galaxies: 
elliptical and lenticular, galaxies: statistics 



1. A BRIEF HISTORY OF GALAXY MORPHOLOGY

One of the fundamental facts of the Universe is that most
large galaxies come in two basic shapes which astronomers
call "Spirals" and "Ellipticals". The exact details of why this
is the case, and how the two types of galaxies relate to each
other, remain a major mystery for astronomers. It is central
to our understanding of how the creation and evolution of
galaxies proceeds with cosmic time and depends on their cos- 
mic location. Significant effort has been spent over the last 
few decades trying to address these questions. 

Edwin Hubble was one of the first astronomers to attempt to
systematically address the origin of the shape, or morphology,
of galaxies using his famous "Hubble Sequence" or "tuning
fork" diagram (Hubble 1926) which is still in use today (see
Figure 1). Starting on the left, Hubble classified the elliptical
galaxies using the observed ellipticity of the galaxy projected 
on the sky, giving them a numerical value associated with how 
round they appeared on the sky. In three dimensions, ellipti- 
cals can be triaxial objects, taking a range of morphologies 
from purely spherical systems through to flattened rugby ball 
shaped galaxies. 

On the right side of the tuning-fork, Hubble placed spiral
or disk galaxies. These galaxies have a central "bulge" of
stars, which resembles an elliptical galaxy in some ways, embedded
in a thin disk of stars that shows a range of spiral patterns
or "arms". Hubble ordered disk galaxies based on the tightness
of these spiral arms and the size of the central bulge. He
had two distinct populations of disk galaxies, namely with
and without a central bar-like (or linear) structure. At the
point where these different classifications met (for spirals with
the largest bulges and most tightly wound arms), Hubble placed
"lenticular" galaxies which at the time were hypothetical disk
galaxies with very large bulges and no spiral arms; they have
since been found.

It is a common misconception that Hubble believed the 
"tuning fork" diagram was an evolutionary sequence, with el- 
liptical galaxies on the left evolving along the sequence to 
form disk galaxies. In fact Hubble advised that "temporal 
connotations are made at one's peril" in an early defence of 
the classification sequence (Hubble 1927), going on to say 
that he set up the classification "without prejudice to theo- 
ries of [galaxy] evolution". This misconception about Hub- 
ble's beliefs probably arose due to his suggestion of the use of 
"early" and "late" types to describe the progression towards 
the right along the sequence (although he noted that this
nomenclature was simply for convenience and borrowed
terminology commonly used for stellar classification; Hubble
1926). Astronomers still call elliptical galaxies "early" types
and disk galaxies "late" type galaxies, although we now know
that most "late" type galaxies have much younger stellar pop- 
ulations (ironically more "early type" stars) than most "early" 
type galaxies. 

Since Hubble, there have been several updates to his classification
scheme (for a recent review see Buta 2011), but key
features have remained unchanged. What has changed dra- 
matically is the number of galaxies catalogued and requiring 
classification. Before the advent of digital detectors in astron- 
omy, astronomers could just visually classify the galaxies they 
saw via their telescopes and/or on photographic plates. New 
astronomers were trained to follow the classification rules 
and provided detailed morphologies for thousands of galaxies. 
Several large catalogues of nearby galaxies with such classifications
exist (e.g. The Hubble Atlas of Galaxies, Sandage
1961, or the Third Reference Catalogue of Bright Galaxies
(RC3), de Vaucouleurs et al. 1991), and many of these
classifications are collected in the NASA/IPAC Extragalactic
Database.

Fig. 1. The Hubble Tuning fork illustrated with the type of SDSS colour
images used in Galaxy Zoo. Credit: Karen Masters and The Sloan Digital
Sky Survey (SDSS) Collaboration, www.sdss.org.

The expert classifier approach quickly became impractical
with the digital surveys because of the size of the galaxy
samples available; for example, the Main Galaxy Sample of
the Sloan Digital Sky Survey (SDSS; York et al. 2000; Strauss
et al. 2002) contains over a million galaxies and simply cannot be
visually inspected by any one astronomer (or even all the
astronomers in the world working together). It became clear that
some automatic method for classifying galaxies was needed, 
but programming a computer to recognise the complexities 
of galaxy shapes (spiral arms, bars, disk plus bulges) is very 
challenging. 

The first attempts at an automated classification scheme included
the use of artificial neural networks. Lahav et al.
(1995) began this work by comparing the classifications of
830 galaxies from a set of six independent experts (R. Buta,
H. Corwin, G. de Vaucouleurs, A. Dressler, J. Huchra and S.
van den Bergh). These experts were unanimous in their classification
in only 1% of objects, while an agreement of 80%
between the experts could only be achieved within a spread
of two T-types. The conclusion was that the visual classifications
depended on the colour, size and quality of the image
used, and that artificial neural networks could be developed to
agree to almost the same degree as any pair of expert classifiers.
This approach was implemented on the SDSS sample
by Ball et al. (2004) with the same basic conclusion as Lahav
et al. (1995), i.e., that a neural network could reproduce visual
morphologies within about 1 T-type.
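
The agreement statistics quoted above (unanimity versus agreement "within a spread" of T-types) can be made concrete with a short sketch. The helper below is hypothetical and is not taken from Lahav et al. (1995) or Ball et al. (2004); it simply reports the fraction of galaxies for which two sets of T-type labels agree to within a chosen tolerance. The sample labels are invented for illustration.

```python
from typing import Sequence


def agreement_fraction(t_a: Sequence[float], t_b: Sequence[float],
                       tolerance: float = 0.0) -> float:
    """Fraction of galaxies whose two T-type labels differ by at most `tolerance`."""
    if len(t_a) != len(t_b) or len(t_a) == 0:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(1 for a, b in zip(t_a, t_b) if abs(a - b) <= tolerance)
    return matches / len(t_a)


# Invented T-types for five galaxies from two hypothetical classifiers.
expert_1 = [-5, 1, 3, 5, 8]
expert_2 = [-4, 1, 5, 4, 10]

exact = agreement_fraction(expert_1, expert_2)            # identical labels only
within_two = agreement_fraction(expert_1, expert_2, 2.0)  # spread of two T-types
print(f"exact agreement: {exact:.0%}, within two T-types: {within_two:.0%}")
```

The same function could compare a neural network's output with a single expert, which is the sense in which a network "reproduces visual morphologies within about 1 T-type".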

An alternative approach to developing methods to replicate
human classification is to design computational algorithms
that attempt to capture the same information. Examples
of this approach include the CAS (concentration, asymmetry,
clumpiness) structural system of Conselice (2006),
which uses a principal-component analysis (PCA) to study
the diversity of internal structures of a sample of galaxies,
and the ZEST (Zurich Estimator of Structural Types) algorithm
of Scarlata et al. (2007), which uses a combination of
diagnostics of the galaxy shape (that can be measured directly
from the galaxy images) and the more traditional Sersic index
from the fit to the two-dimensional surface brightness distribution
of the galaxy. Such data-orientated methods are very
successful at capturing the complexity of galaxy shapes in 
two-dimensional images but remain hard to translate in terms 
of the more traditional, established morphological classes dis- 
cussed above. 

In recent years, there has been significant interest in the 
development of model-based morphologies of galaxies that 
use established parametric models for the light distribution 
of galaxies to fit to the two-dimensional images of galaxies. 
Such methods include GIM2D (Simard et al. 2002) and Galfit
(Peng et al. 2002), which both attempt to fit galaxy images
with a combination of a disk and bulge model. These can 
be used to construct an objective classification scheme such 
as the bulge-to-disk ratio of galaxies. Unfortunately, these
model-fitting techniques are computationally intensive and subject
to local minima as they search the high-dimensional parameter
space for the best-fitting model (in some cases, a
12-parameter model is being fit to the galaxy images).

A quicker way to solve the classification problem is to use 
a proxy for the galaxy class. The most common such proxy 
is the colour of a galaxy as most ellipticals are "red", with 
their light dominated by older stars, while most spirals are 
"blue", as they contain areas of active star formation which 
include luminous blue stars. However, relying on galaxy clas- 
sification via colours as a proxy misses an important piece of 
the galaxy evolution story. The colour of a galaxy is driven 
by the stellar (and gas and dust) content of the galaxy, while 
the shape or morphology of a galaxy reflects its dynamical 
history which could be very different (and have a different 
timescale). Therefore, one of the central motivations for the 
original Galaxy Zoo project was to construct a large sample 
of early- and late-type galaxy classifications that were independent
of colour.

2. GENESIS OF GALAXY ZOO 

The Galaxy Zoo project was inspired by discussions of the
limitations of a sample of early-type galaxies produced by
Bernardi et al. (2003) from the initial Sloan Digital Sky Survey
data (York et al. 2000). Bernardi et al. (2003) had used
a PCA-based classification to select "passive" galaxies based 
on the spectra of SDSS galaxies. Although this classification 
scheme was fast, and easy to implement, it probably excluded 
early-type galaxies that had signatures of on-going star forma- 
tion. It was therefore realised that to find such objects would 
require a sample of early-type galaxies based solely on their 
morphological visual appearance, without the use of spectral 
or colour information, i.e., that could include normal, passive 
"red" early-types, as well as the possibility of "bluer" star- 
forming early-types. Kevin Schawinski, as part of his PhD 
thesis work at Oxford University, under the supervision of 
Daniel Thomas, took on the task of building such a complete
sample of early-type galaxies based solely on their visual appearance
and started by inspecting 50,000 SDSS galaxies
to create the Morphologically Selected Ellipticals in SDSS
(MOSES) sample, an order of magnitude more than any visually
inspected sample created to that point. The MOSES
sample has produced a number of interesting results (e.g.
Schawinski et al. 2007a,b, 2009b; Thomas et al. 2010) and in
particular shows that there is a significant fraction of early- 
type galaxies that show recent star-formation activity. 

The experience with MOSES proved the need for independent
morphological classifications for galaxies, while
also demonstrating that scaling the MOSES methodology 
to all SDSS galaxies was unfeasible for a small number of 
researchers to manage. At this point, Kevin Schawinski 
and Chris Lintott (a researcher at Oxford also involved in 
MOSES) became motivated to find a way to visually clas- 
sify all SDSS galaxies in a reasonable amount of time, thus 
creating the initial "Galaxy Zoo" concept. They concluded 
that the only reasonable way to approach this problem was to
"outsource" the visual inspection task and put it on the internet,
inviting volunteers to participate. At the time, the Stardust@Home
Project was using the internet to recruit volunteers
to identify tracks made by interstellar dust in samples
that were flown on NASA's Stardust sample-return mission
to Comet Wild-2 (Westphal et al. 2006). Stardust@Home
had ~20,000 volunteers, and by extrapolation, Lintott and
Schawinski figured that if even one quarter of 20,000 volunteers
did one galaxy classification per day, the full SDSS Main
Galaxy Sample (approximately a million galaxies) could have
secure galaxy classifications in three years (assuming each
galaxy was visually inspected five times).
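
This back-of-the-envelope estimate can be reproduced in a few lines. The figures are those quoted in the paragraph above; the variable names are ours and purely illustrative.

```python
# Figures quoted in the text.
stardust_volunteers = 20_000
active_fraction = 0.25          # "one quarter" of the volunteers
classifications_per_day = 1     # per active volunteer
galaxies = 1_000_000            # approximate size of the SDSS Main Galaxy Sample
inspections_per_galaxy = 5      # independent looks wanted for each galaxy

daily_rate = stardust_volunteers * active_fraction * classifications_per_day
total_needed = galaxies * inspections_per_galaxy
days = total_needed / daily_rate
years = days / 365.25

print(f"{daily_rate:.0f} classifications per day -> {days:.0f} days (~{years:.1f} years)")
```

Five thousand classifications a day against five million required gives 1000 days, i.e. a little under three years, consistent with the estimate in the text.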

At the same time, another researcher at Oxford, Kate Land,
was planning a similar interface to classify and characterise
the sense of rotation of spiral galaxies. She was interested in 
an article that suggested there was a correlation between the 
"handedness" of spiral arms in the SDSS disk galaxies and 
their position on the sky, i.e., that the direction of the rotation 
of disk galaxies did not appear to be random (Longo 2007). 
Land had planned to build an interface on a laptop computer 
and then place it in the canteen of the Oxford Physics Department,
hoping to enlist the help of her fellow scientists. However,
after a fortuitous meeting of the two groups, it became
clear that the projects could be merged into a single interface 
addressing both questions. 

Phil Murray and Dan Andreescu of Fingerprint Digital Me- 
dia were recruited to design the Galaxy Zoo website and the 
initial success of Galaxy Zoo can probably be credited to 
the visual appeal and ease-of-use of the interface design, 
combined with a relatively easy classification scheme. The 
user was asked if the galaxy image they saw was "Spiral" 
or "Elliptical", followed by the classification of the apparent
spin direction of the spiral arms (clockwise or anticlockwise). 
Another key factor was that people could get started right 
away after a relatively short tutorial. Once a user had passed
the tutorial, they were free to classify as many galaxies as
they wished and could log in and out of their account at
will. The original Galaxy Zoo team, along with Land, Lintott
and Schawinski, included experts from the SDSS (Alex
Szalay, Bob Nichol, Steven Bamford, Anze Slosar) and the
MOSES team (Daniel Thomas), as well as experts in astron- 
omy outreach (Jordan Raddick) and data archives (Jan van 
den Berg). 

Galaxy Zoo was launched on July 11, 2007 and introduced
in a BBC online article that same day. In the first three hours
after launch, classifications were coming in at such a high
rate that the data servers located at Johns Hopkins University
hosting the site and SDSS images were unable to meet the
demand. Fortunately, additional capacity was brought online
quickly and, within twelve hours of the launch, the Galaxy
Zoo site was receiving 20,000 classifications per hour. After
forty hours, the classification rate had increased to 60,000 per
hour. After ten days, the public had submitted ~8 million
classifications. By April 2008, when the Galaxy Zoo team
submitted their first paper (Lintott et al. 2008), over 100,000
volunteers had classified each of the ~900,000 SDSS galaxy
images an average of 38 times.

One of the unforeseen consequences of the Galaxy Zoo 
launch was the avalanche of email the team received from 
the public. Within two weeks, the original Galaxy Zoo team 
was swamped with requests for information and queries, and 
several additional people were recruited to help manage these 
requests. This need to communicate inspired the creation of 
a Galaxy Zoo internet forum which encouraged the Galaxy 
Zoo users to communicate with each other (overseen by the 
Galaxy Zoo team). This allowed many of the basic queries 
from the public to be answered by other members of the pub- 
lic more experienced with Galaxy Zoo, and also allowed the 
volunteers (who named themselves "Zooites") to share their 
thoughts and ideas with each other. Once the forum was es- 
tablished, several members of the public ("citizen scientists") 
quickly volunteered to moderate the forum and began to gen- 
erate a variety of discussion threads which included basic help 
with understanding astronomy and Galaxy Zoo, and a reposi- 
tory for "weird and wonderful" images people found. 

In addition to the forum, in December 2007 the team began
to communicate with the volunteers through a series of blog
messages about the progress of the project and science.

3. GALAXY ZOO 1 

As described above, the first phase of Galaxy Zoo (now
known as "Galaxy Zoo 1" or GZ1) asked volunteers to provide
only basic morphological information on each galaxy.
They were asked to identify if a galaxy was "spiral", "elliptical",
"a merger" or "star/don't know" and additionally split the
spiral category into "clockwise", "anticlockwise" and "edge-on/don't
know". Galaxies for the GZ1 project were drawn
from the Main Galaxy Sample of the sixth SDSS Data Release
(Strauss et al. 2002; Adelman-McCarthy et al. 2008)
and comprised all extended objects in the survey that were
brighter than a Petrosian magnitude of r < 17.77 mag. All
objects were included, whether or not they had an SDSS spectrum,
giving a total of 893,212 images.

3.1. From Clicks to Classifications 

The Galaxy Zoo project was extremely successful in recruiting
volunteer classifiers, thus providing each galaxy in the
sample with multiple independent classifications; GZ1 has a
mean of 38 classifications per galaxy, with at least 20 classifications
for every galaxy. Most previous morphological classifications
had been done by single experts (or small groups of
experts) agreeing on a single answer, but in Galaxy Zoo the 
situation was more like a "vote" on the galaxy classification. 
Going from these votes, or "clicks", to classifications can be 
done in several ways. 

The first step in processing the user-generated data was to
"clean" them by removing the tiny fraction of potentially malicious
users, and any chance multiple classifications of a single
galaxy by a given classifier. Next, there was a decision
about how much weight each vote should have. The simplest
choice is to give all classifiers equal weight. This gives a distribution
of classifications for a galaxy which encodes information
about the most likely classification as well as some
measure of how certain that is (in the spread of classifications).

In GZ1 a weighting scheme was also explored which
weighted users based on how well they agreed with the ma- 
jority (in practice this was applied iteratively). This was an 
attempt to give more weight to "better" classifiers, where 
"better" was defined as agreeing with the majority. These 
"weighted" classifications for the most part were similar to 
unweighted classifications. 
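
The steps described in this subsection (removing repeat votes, forming equal-weight vote fractions, and iteratively re-weighting users by their agreement with the majority) can be sketched as follows. This is a minimal illustration under our own assumptions: the text does not give the weighting function used in GZ1, so the rule here (a user's weight is the fraction of their votes that match the weighted majority) is a stand-in, and the function names and toy votes are hypothetical.

```python
from collections import defaultdict
from typing import Optional

Vote = tuple[str, str, str]  # (user, galaxy, label)


def deduplicate(votes: list[Vote]) -> list[Vote]:
    """Keep only the first vote a given user cast on a given galaxy."""
    seen: set[tuple[str, str]] = set()
    kept: list[Vote] = []
    for user, galaxy, label in votes:
        if (user, galaxy) not in seen:
            seen.add((user, galaxy))
            kept.append((user, galaxy, label))
    return kept


def vote_fractions(votes: list[Vote],
                   weights: Optional[dict[str, float]] = None) -> dict[str, dict[str, float]]:
    """Per-galaxy fraction of (optionally weighted) votes for each label."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for user, galaxy, label in votes:
        totals[galaxy][label] += 1.0 if weights is None else weights.get(user, 0.0)
    fractions: dict[str, dict[str, float]] = {}
    for galaxy, counts in totals.items():
        norm = sum(counts.values())
        fractions[galaxy] = {lab: (c / norm if norm else 0.0) for lab, c in counts.items()}
    return fractions


def iterate_weights(votes: list[Vote], n_iter: int = 3) -> dict[str, float]:
    """Assumed rule: weight = fraction of a user's votes matching the weighted majority."""
    users = {user for user, _, _ in votes}
    weights = {user: 1.0 for user in users}  # start from equal weights
    for _ in range(n_iter):
        fractions = vote_fractions(votes, weights)
        # Ties are broken by whichever label was encountered first.
        majority = {g: max(f, key=f.get) for g, f in fractions.items()}
        agree: dict[str, int] = defaultdict(int)
        cast: dict[str, int] = defaultdict(int)
        for user, galaxy, label in votes:
            cast[user] += 1
            if label == majority[galaxy]:
                agree[user] += 1
        weights = {user: agree[user] / cast[user] for user in users}
    return weights


# Toy data: three hypothetical users, two galaxies, one accidental repeat vote.
votes = [
    ("ann", "g1", "spiral"), ("bob", "g1", "spiral"), ("cat", "g1", "elliptical"),
    ("ann", "g2", "elliptical"), ("bob", "g2", "elliptical"), ("cat", "g2", "spiral"),
    ("ann", "g1", "spiral"),
]
clean = deduplicate(votes)
raw = vote_fractions(clean)
weights = iterate_weights(clean)
weighted = vote_fractions(clean, weights)
```

In this deliberately extreme toy case the dissenting user ends up with zero weight; with tens of classifications per galaxy, as in GZ1, the text reports that weighted and unweighted results were for the most part similar.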

3.1.1. Classification Biases 

Several bias studies were run in the original GZ1 to test
the effect of the interface and types of images shown to the
volunteers on the classifications which were entered. The two
main goals of the bias studies were to: (1) test the effect of
using colour images for the classifications, and (2) test if users
could reliably identify the sense of the spiral arm winding. To
achieve this, a small number of monochrome and mirrored
images were added to the GZ1 sample and the clicks on these
images were compared to the original, unperturbed images.

Interestingly, a change in the behaviour of the volunteer 
classifiers was witnessed during these bias testing exercises, 
in the sense that users appeared to be more careful in their 
classifications during bias testing periods. Therefore, only 
clicks collected on the original images at the same time as 
the tests were being carried out could be used for the compar- 
ison between classifications. The results of the monochrome 
bias test showed that there were only small differences in the 
galaxy classifications between colour and black-and-white 
images. Users were slightly more likely to classify objects 
as "elliptical" in monochrome images; 56% of the votes went 
to ellipticals in the monochrome images compared to 55% in 
the original colour SDSS images. 

The results of the mirror image bias testing are discussed
extensively in Land et al. (2008). They showed a significant
bias in favour of anti-clockwise direction arms (in both the
original and mirrored images). This bias
could be due to psychological effects (possibly related to the
preference for right handedness amongst the population), or
possibly site design (it being easier to click the anti-clockwise
button for example). However, once this bias was corrected
for, the data could still be used (see below).
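
The logic of the mirror test can be illustrated with a small sketch. If an anti-clockwise excess were a property of the sky, mirroring the images would turn it into a clockwise excess; if it comes from the classifiers or the interface, it stays anti-clockwise in both image sets. The decomposition below rests on that assumption only, uses invented vote counts, and is not the estimator used by Land et al. (2008); the function names are hypothetical.

```python
def acw_excess(n_cw: int, n_acw: int) -> float:
    """Fractional excess of anti-clockwise over clockwise votes."""
    total = n_cw + n_acw
    if total == 0:
        raise ValueError("no votes")
    return (n_acw - n_cw) / total


def split_bias(orig: tuple[int, int], mirrored: tuple[int, int]) -> tuple[float, float]:
    """Return (classifier_bias, sky_asymmetry) from (n_cw, n_acw) vote counts.

    A classifier bias has the same sign in both image sets, while a real
    asymmetry on the sky changes sign when the images are mirrored.
    """
    e_orig = acw_excess(*orig)
    e_mirr = acw_excess(*mirrored)
    return (e_orig + e_mirr) / 2, (e_orig - e_mirr) / 2


# Invented counts: anti-clockwise is favoured equally in both sets of images.
bias, asymmetry = split_bias(orig=(4800, 5200), mirrored=(4800, 5200))
print(f"classifier bias: {bias:+.3f}, sky asymmetry: {asymmetry:+.3f}")
```

With these made-up numbers the whole excess is attributed to the classifiers and none to the sky, which is the qualitative pattern the paragraph above describes.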

Finally, another source of bias in the GZ classifications has
to do with the distance to the observed galaxies. We expect
that at some distance, features become harder to resolve, and
more galaxies will be classified as ellipticals. This effect was
indeed found in the GZ1 sample by Bamford et al. (2009)
where a correction was derived as a function of redshift and
galaxy luminosity. The conclusion was that the GZ1 classifications
are reliable, and the bias correction is small, for redshifts
z < 0.08, but at higher redshifts there is a strong
trend for galaxies to be classified preferentially as elliptical.

3.1.2. Comparison with Other Classifications 

In Lintott et al. (2008), the GZ1 classifications were compared
against three sets of independent galaxy classifications.
These included early-type galaxies in the MOSES sample
(Schawinski et al. 2007b), a set of 2275 SDSS galaxies of
all galaxy types classified by Fukugita et al. (2007), and the
sample of 2834 visually identified SDSS spiral galaxies from
Longo (2007). In all cases, GZ1 classifications were found to
agree remarkably well (better than 90% of the time in most
cases), and the conclusion was that using data from volunteers
did not substantially degrade the quality of classifications,
while expanding the number of classified galaxies by
a large factor and additionally reducing the scope for human
error introducing erroneous classifications.

3.2. Science Results from Galaxy Zoo 1 

Classifications from GZ1 have been used for a wide range
of galaxy evolution studies. A full list of the peer-reviewed
papers coming from within the Galaxy Zoo 1 team is provided
in Table 1. We review some of these science results
here and stress that the data from GZ1 are now publicly available
(Lintott et al. 2011) and being used by several scientists
beyond the original GZ1 team. For example, Galaxy Zoo 1
was used to remove late-type contaminants from the study
of Trujillo, Ferreras, and de la Rosa (2011), and was compared
against a new method for automated classification in
Huertas-Company et al. (2011). Moreover, the Galaxy Zoo
1 classifications have now been included in the Eighth Data
Release of the SDSS (see Aihara et al. 2011) and can be electronically
accessed alongside other SDSS galaxy parameters
in their Catalog Archive Server (CAS).

3.2.1. Colour and Morphology 

The greatest legacy from GZ1 has been the decoupling of
colour and morphology with high statistical significance. We
have demonstrated that 80% of galaxies follow the expected
correlations between colour and morphology, i.e., either "red"
early-type galaxies or "blue" spiral galaxies. Therefore, for a
majority of galaxies, colour can be used as a crude proxy for
morphology. However, GZ1 also shows that there are significant
numbers of red (passive) spiral galaxies and blue early-type
galaxies. These interesting sub-populations of galaxies
have been explored in a number of GZ papers (see Table 1).

This disentangling of morphology and colour has been used
to study the separate dependences of the properties on environment
and provide evidence that the transformation of
galaxies from "blue" to "red" proceeds faster than the transformation
from spiral to early-type (see Bamford et al. (2009)
and Skibba et al. (2009), which use different methods to quantify
this effect). The properties of the "blue" early-type galaxies
in Galaxy Zoo have been studied by Schawinski et al.
(2009a), and "red" (passive) spirals have been explored further
by Masters et al. (2010a,b).

3.2.2. Spiral Arm Directions 

The clockwise/anti-clockwise classifications of the spiral
galaxies have been used to show that (as expected from the
cosmological principle) there is no evidence for a preferred
rotation direction in the universe, but that humans preferentially
classify spiral galaxies as anti-clockwise (Land et al.
2008). They also hint at a local correlation of galaxy spins at distances
less than ~0.5 Mpc, the first experimental evidence
for chiral correlation of spins (Slosar et al. 2009). Intriguingly,
there are also hints of a correlation between star formation
history and spin alignments (Jimenez et al. 2010).

3.2.3. Merging Galaxies 

The sample of merging galaxies has been used to show that
the local fraction of mergers is about 1-3% and to study the
global properties of merging galaxies (Darg et al. 2010a,b).
Multi-mergers (where more than two galaxies are merging at
once), which are much rarer than binary mergers, have also
recently been studied (Darg et al. 2011).



TABLE 1

Peer reviewed papers based on classifications collected in
the first phase of Galaxy Zoo (in order of publication).

Author & Year                    Title - Galaxy Zoo:

Kate Land et al. 2008            The large-scale spin statistics of spiral galaxies
                                 in the Sloan Digital Sky Survey
Chris Lintott et al. 2008        Morphologies derived from visual inspection of
                                 galaxies from the SDSS
Anze Slosar et al. 2009          Chiral correlation function of galaxy spins
Steven Bamford et al. 2009       The dependence of morphology and colour
                                 on environment
Kevin Schawinski et al. 2009a    A sample of blue early-type galaxies at low redshift
Chris Lintott et al. 2009        'Hanny's Voorwerp', a quasar light echo?
Ramin Skibba et al. 2009         Disentangling the environmental dependence of
                                 morphology and colour
Carie Cardamone et al. 2009      Green Peas: discovery of a class of compact
                                 extremely star-forming galaxies
Danny Darg et al. 2010a          The fraction of merging galaxies in the SDSS and
                                 their morphologies
Danny Darg et al. 2010b          The properties of merging galaxies in the nearby
                                 Universe - local environments, colours, masses,
                                 star formation rates and AGN activity
Kevin Schawinski et al. 2010a    The Sudden Death of the Nearest Quasar
Kevin Schawinski et al. 2010b    The Fundamentally Different Co-Evolution of
                                 Supermassive Black Holes and Their Early- and
                                 Late-Type Host Galaxies
Karen Masters et al. 2010a       Dust in spiral galaxies
Raul Jimenez et al. 2010         A correlation between the coherence of galaxy spin
                                 chirality and star formation efficiency
Karen Masters et al. 2010b       Passive red spirals
Manda Banerji et al. 2010        Reproducing galaxy morphologies via machine learning
Chris Lintott et al. 2011        Data Release of Morphological Classifications for
                                 nearly 900,000 galaxies
O. Ivy Wong et al. 2011          Building the low-mass end of the red sequence with
                                 local post-starburst galaxies
Daniel Darg et al. 2011          Multi-Mergers and the Millennium Simulation

3.2.4. Active Galaxies

The GZ1 classifications also revealed interesting correlations
between galaxy morphology and black hole growth. By
splitting both the normal galaxy population and the active
galaxy population by morphology, two fundamentally different
modes of black hole feeding and feedback in early-
and late-type galaxies were found (Schawinski et al. 2010b).
Early-type active galactic nucleus (AGN) host galaxies are
systematically lower mass and bluer than the general early-type
population. Black hole growth is concentrated strongly
in the "green valley" between the blue cloud and the low-mass
end of the red sequence. These early-type AGN host galaxies
furthermore feature strong post-starburst stellar populations
(Schawinski et al. 2007b) and thus are migrating from the blue
cloud to passive evolution at the low-mass end of the red sequence;
they are thus building up the red sequence today.

Late-type AGN host galaxies dominate by number (up to
90% if "indeterminate" are included) and reside predominantly
in massive host galaxies with no indications of recent
suppression of star formation. Black hole growth in these
disk-dominated galaxies is likely stochastic and has no significant
connection to the evolutionary trajectory of the host
galaxy. Intriguingly, the Milky Way galaxy resides in the locus
of mass and colour where black hole growth is most likely,
potentially making the Milky Way and Sagittarius A* a prototype
for this "secular" mode of black hole feeding in late-type
galaxies.

3.2.5. Rare and Unusual Objects

GZ1 has brought to light several rare classes of object.
"Hanny's Voorwerp" is perhaps the most famous of such objects
and many are familiar with the story of the Dutch school
teacher Hanny, who first noted this object (she was not the
first volunteer to see it, but the first to ask about it) which is
now memorialized in a Comic Book. The Voorwerp is an
unusual emission line nebula neighbouring the spiral galaxy
IC 2497 and has been studied in several follow-up projects
(e.g. Lintott et al. 2009; Rampadarath et al. 2010; Schawinski
et al. 2010a), and also features in much of the education
material from Galaxy Zoo.

Another unusual class of objects discovered by the Galaxy
Zoo volunteers are the "Green Peas". The properties of these
emission-line galaxies, which appear green in the SDSS composite
gri colour images because of their strong [O III] emission,
are studied in detail in Cardamone et al. (2009).

4. EVOLUTION OF GALAXY ZOO

4.1. Galaxy Zoo 2 and Hubble Zoo

As the original Galaxy Zoo was the first time such a project
had been attempted, the Galaxy Zoo team was cautious with
their classification scheme, only asking for simple information
about the appearance of the galaxies. Thanks to the overwhelming
response, and prompted by requests from the volunteers
who wanted to provide more detailed classifications,
the team realized they could harvest much more information
from the SDSS images than in GZ1. Therefore, Galaxy Zoo
2 (GZ2) was designed around asking more detailed questions
about the ~250,000 brightest SDSS galaxies from the original
GZ1 sample of galaxies. Once again, the response
was tremendous and in the fourteen months the site was live,
Galaxy Zoo 2 users provided over 60 million classifications.
Along the way, deeper SDSS images were added for a subset
of GZ2 galaxies, taken from a patch of the sky known as
"Stripe 82" which allows fainter structures in these galaxies
to be visible.

The first science results from GZ2 classifications are now
appearing. In Masters et al. (2011), we showed that the fraction
of barred disk galaxies (as compared to unbarred galaxies)
depends on other galaxy properties, especially the overall
colour of the galaxy and the size of the central bulge. As
a satellite project, Ben Hoyle at Portsmouth University developed
an additional web interface using Google Maps technologies
to allow GZ2 volunteers to draw the shapes and sizes
of bars on GZ2-selected disk galaxies (Hoyle et al. 2011).
From September 2009 to January 2010, he received 16,551 
bar drawings for 8180 galaxies, making it by far the largest 
sample of disk galaxies with known bar lengths; again demon- 
strating the attraction of Galaxy Zoo even for such a complex 
task. These studies combined show the strong connection 
between the bar of a disk galaxy and its overall colour, i.e., 
disk galaxies with long bars also exhibit prominent bulges and 
have redder colours than galaxies with smaller bars. 

see http://hannysvoorwerp.zooniverse.org/

The website (http://zoo2.galaxyzoo.org/) for this phase of
Galaxy Zoo was designed by Phil Murray and implemented by Danny
Locksmith and Arfon Smith.

After Galaxy Zoo 2, the team launched "Hubble Zoo". To
really understand galaxy evolution, and to get a sense of how
the colour-morphology relation might change over time, it is
important to be able to classify morphologies for galaxies that 
are much further away than those classified from the SDSS. 
The light from these galaxies has taken much longer to get
to us and hence provides images of galaxies at a much earlier
epoch in the history of the universe. Such a dataset will allow
us to answer questions like: Are there more blue ellipticals
compared to red ellipticals earlier on in the Universe? Does
the number of irregularly shaped galaxies increase as we look
back further in time? To compare the results from the GZ1
and GZ2 classifications of the SDSS galaxies to galaxies at
an earlier epoch, the latest incarnation of Galaxy Zoo is using
data from the Hubble Space Telescope (HST), which goes
deeper than ever before, e.g., HST COSMOS (Cosmic Evolution
Survey) has over two million galaxies that cover 75% of
the age of the universe (Scoville et al. 2007). Hubble Zoo
is currently undergoing classification using HST data from
GEMS (Rix et al. 2004), GOODS (Giavalisco et al. 2004),
AEGIS (Davis et al. 2007) and COSMOS (Scoville et al.
2007). The decision tree is identical to that for GZ2 except
that there is an additional branch that classifies the "clumpiness"
and symmetry of each galaxy.

4.2. The Citizen Scientists - Motivation and Unexpected 
Outcomes 

Within the first several days after launch, it was clear to the 
Galaxy Zoo team that they had hit a nerve with the public - 
classifying galaxies on Galaxy Zoo provided some sort of ful- 
filment for the volunteers. GZ team members suspected that 
the popularity of the project relied on the beauty of the im- 
ages, or that the project had benefited from particularly good 
and lucky publicity. Already, team members were thinking 
of other scholarly areas where applying the method of visu- 
ally inspecting data could lead to publishable results beyond 
what could be accomplished by application of machine al- 
gorithms. But before any such steps could be taken, it was 
essential to understand the motivations for volunteers participating
in Galaxy Zoo. A survey of the motivations of citizen
scientists involved in Galaxy Zoo is presented in Raddick et
al. (2010). The results show that by far the most common
motivation Galaxy Zoo volunteers cite for their involvement in
the project is their desire to contribute to real scientific work. 

Thus it should not have come as a surprise that many 
Galaxy Zoo volunteers developed their own lines of inquiry 
off the main task page. The Galaxy Zoo forum acted as a 
clearing house for volunteers to describe and discuss objects 
that they felt were noteworthy. Several threads were devoted 
to collecting objects with specific characteristics, e.g. triple 
mergers, or "overlapping" galaxies, or small, round, green 
galaxies dubbed "Peas". These three examples have all resulted
in scientific papers (Darg et al. 2011; Keel et al. 2011;
Cardamone et al. 2009, respectively).



But one of the critical aspects enabling the development of 
collections and further inquiry into an object's characteristics 
was the link from the main task page for each object to the 
SDSS SkyServer Object Explorer page. This page aggregated
information about the galaxy including the image and
accompanying spectrum as well as information about its magnitude,
redshift, cross-identifications in other wavelengths and
a host of links to more information such as NASA's Extragalactic
Database. It is through the Object Explorer page
that Galaxy Zoo volunteers began to notice that the "Peas"
all had extraordinarily high fluxes in the [OIII] emission line. 
Eventually over 250 of these objects were found while volun- 
teers taught each other through the forum what the character- 
istics of a "pea" were and began to trade literature searches on 
what [OIII] meant and possible interpretations of these galaxies.
After the volunteers had spent several months collecting and
interpreting on their own, a graduate student from Yale, Carie
Cardamone, was assigned to moderate the "Peas" forum, working
with the volunteers while she developed the full analysis of these
rare dwarf galaxies with extremely high star formation rates (which
was published as Cardamone et al. 2009).

The story of the Galaxy Zoo "Peas" inspired the team to 
ensure that future projects provide links to supporting infor- 
mation and analysis tools related to the objects shown in the 
primary task. This enables users to conduct their own
research and to learn the process of research,
aided by peer-mentoring.

The experience with the Galaxy Zoo forums and blogs 
shows that the citizen scientist volunteers wanted to do much 
more than classify objects. They built a community of the 
volunteers, by the volunteers and for the volunteers. Indeed, 
Galaxy Zoo belonged to the volunteers - it was their time just 
as much as it was the scientists' time spent working on the
project. The team understood this important fact and made 
it a point of principle to keep the volunteers informed about 
various aspects of the project from the technical to the social 
and scientific. Moreover, the team realised early on that they 
must respect the time and commitment of the volunteers, and 
should only harvest classifications for as long as they were 
scientifically useful. 

In fact the volunteers have set up several projects of 
their own using Galaxy Zoo infrastructure or methods. The 
largest example of such a project is probably the "Irregulars" 
project. Initiated by Galaxy Zoo volunteers Richard Proctor
("Waveney") and Julia Wilkinson ("Jules") on a forum
thread, the aim of this project was initially to collect a sample
of irregular galaxies, i.e. galaxies that did not fit in to the
classification scheme at all. This project now uses a self-built
web interface (similar in style to Galaxy Zoo) to ask for
classifications of the objects and has inspired several
volunteer-led research papers. Richard Proctor has recently applied
to do a part-time PhD at the Open University using the data 
collected in this project. 

The Galaxy Zoo forum has been a scientific gold mine
on several occasions. Examples of science results coming directly
from the forum include the discovery of the Voorwerp
(Lintott et al. 2009), targeted and serendipitous searches for
smaller versions of the Voorwerp ("voorwerpjes"; Chojnowski
& Keel 2011; Gagne et al. 2011), the Peas (Cardamone et
al. 2009), overlaps (Keel et al. 2011), ring galaxies, etc. The
depth of interest shown by some of the volunteers is extraordinary.
Volunteer Richard Proctor ("Waveney") has set up web
forms for several searches and sample evaluations (including
the "Irregulars" project mentioned above). Massimo Mezzoprete
("Half65") was so interested in the overlapping-galaxy
search (Keel et al. 2011) that he learned SQL and Perl, creating
a tool that he could point to a forum thread and have it
parse for either kind of unique SDSS Object ID, then query



http : / / skyserver . sdss . orq/ dr 8/ en /tools/explore/ ob j 



http : / /nedwww . ipac . caltech . edu/ 



http : //www . galaxyzooforum. org/ index . php?topic=2734 10 . 


http : / /www . wavwebs . com/GZ/Irr 


egular/Hunt . cgi 


http : / /www . galaxyzooforum. org 
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finding chart, photometry and positional data for each object. 
(Mezzoprete is a co-author on the first overlapping-galaxy 
paper). These forum results clearly show that through the 
Galaxy Zoo project, citizen scientists have become research 
collaborators. 

5. THE ZOONIVERSE 

The extension of the Galaxy Zoo idea to other scientific 
domains is obvious in our data-rich world, especially given 
the desire of the public to be involved in scientific inves- 
tigations of these data. Researchers across a diverse range 
of academic fields face the common problem of developing
new strategies and modes of computational thinking needed 
to transform this data flood into knowledge. With the current 
moderate-sized databases (terabytes), citizen science methods 
like Galaxy Zoo can replace some aspects of machine algo- 
rithms. However, as the data deluge will only intensify in 
the next decades, machine algorithms must advance to meet 
the data processing demands, incorporating techniques based 
on developing areas such as computer vision. Instead of dis- 
placing the citizen science method, these new algorithms will 
need to be trained from, and tested by, human input (e.g., GZ1
classifications have been used for machine learning in Banerji
et al. (2010)). Thus, the visual processing methods of Galaxy
Zoo will become essential to fully extract information from
the data. 

It is tempting to think of Galaxy Zoo purely as an Educa- 
tion and Outreach endeavor with all its successes in garner- 
ing publicity and focus on a community of non-expert volun- 
teers. And with that temptation, one might imagine applying 
the Galaxy Zoo method to an indiscriminate array of projects 
with the idea that the public would be engaged in the pro- 
cess so it does not matter if the scientific outputs were "real" 
or whether the data processing could have been better ac- 
complished through standard computational methods. What 
must be made clear is that Galaxy Zoo turned citizen science 
into a data processing method - a data reduction tool for data- 
intensive science which when applied correctly provides the 
best possible data product from a set of "raw" data. The ge- 
nius in this method lies in the fact that the public actually pre- 
fer to participate in a meaningful set of tasks where they know 
their work is useful. Galaxy Zoo established this coupling 
between high-priority science output and public engagement
in science. Once it became clear that the volunteer
classifiers had the appetite to process significantly more data, the
question became one of how this new citizen science method
could be made available across different disciplines and data
products, and how to begin the process of developing the
machine algorithms trained by the human classifiers.

The first step in providing access to more, and varied, data- 
intensive projects was to aggregate individual citizen science 
projects onto a common web-based portal. Several objectives 
are met by establishing a centralized common entry point. 
First, a home base is provided for the volunteers so they 
can move with ease between projects. It encourages a sense 
of community as the same volunteers can share information 
about their work within, and across, projects. It builds con- 
fidence and "brand loyalty" allowing volunteers to become 
more willing to try new types of projects and progress in their 
learning. Second, aggregating projects allows for cyberinfras- 
tructure that can take advantage of cross-project efficiencies 
while retaining the flexibility to provide individualized tools 
for specific projects. For example, shared software makes development
of different projects possible on a reasonably short
timescale, allowing a quicker reaction to new opportunities. Among
other advantages, these factors reduce the
overhead of recruiting volunteers and allow for the possibility
of deploying small and exploratory projects that would be
prohibitive to create on their own. Building on the zoo aspect
of the Galaxy Zoo brand, the "Zooniverse" became the answer 
to how to create a centralized portal to a universe of Zoo-like 
projects. 

To turn the "Zooniverse" into reality, several new projects 
with data sets beyond the SDSS were developed. In order to 
help manage the Zooniverse and its expanding set of projects, 
in June 2009 the Citizen Science Alliance was formed initially
by Chris Lintott, Steven Bamford, Lucy Fortson and Arfon
Smith. The Zooniverse Project website was launched in
December 2009. 

To shift from the original Galaxy Zoo to the Zooniverse, 
substantial technical changes were implemented in order to 
produce a robust and flexible system. The most important 
change was the shift from hosting on a single server to host- 
ing in the "Cloud", i.e., making use of commercial services 
provided by Amazon Web Services. This technology allows 
new servers to be brought online in response to demand, and 
therefore allows the site to cope with spikes in internet traf- 
fic due to the fluctuating media coverage. The new system is 
built in a "Ruby on Rails" framework with a RESTful API layer
between a thin web layer and the database. Authentication 
of users is carried out by an implementation of the Central 
Authentication Service (CAS) single sign-on solution. This 
technology allows volunteers to use the same account for both 
the forum and the main Galaxy Zoo site, as well as between 
different projects. The use of an API allows the Zooniverse 
team to support not only the main website but also iPhone 
and Android applications, allowing mobile users to take part 
in Galaxy Zoo. Early results suggest that this may be an effec- 
tive way of increasing the number of classifications per user. 
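
As a rough illustration of why an API layer lets the website and the
mobile applications share one back end, the sketch below routes every
classification through a single function, whichever client sent it.
All names here (submit_classification, the payload fields, the
in-memory store) are hypothetical and are not taken from the Zooniverse
codebase, which is built in Ruby on Rails; Python is used only for
brevity.

```python
import json

# Hypothetical fields that every client must send with a classification.
REQUIRED_FIELDS = {"user_id", "subject_id", "answers", "client"}

def submit_classification(raw_json, store):
    """Validate one classification from any client and append it to store.

    Returns a (status_code, body) pair in the style of an HTTP response.
    """
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return 400, {"error": "body is not valid JSON"}
    if not isinstance(payload, dict):
        return 400, {"error": "body must be a JSON object"}
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        return 422, {"error": "missing fields", "fields": sorted(missing)}
    store.append(payload)
    return 201, {"id": len(store)}
```

Because the web site and a phone application would both call the same
entry point, validation and storage rules live in one place and new
clients need no server-side changes.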

The Zooniverse codebase was designed with a flexible do- 
main model and extensible reuse of code. These attributes 
allow features developed for new projects to be useable by 
all projects. The use of cloud computing services provides 
hosting scalability, while the virtual platform also handles 
content distribution and asynchronous classification process- 
ing. As of early 2011, the "Zooniverse" is running eight Zoo 
projects and has handled many millions of classifications by 
more than 250,000 users. Several different task functions have 
been implemented through these projects including basic decision
trees, drawing shapes on images ("Moon Zoo", "Milky
Way Project"), real-time asset prioritization and alerts with
the Galaxy Zoo Supernova project of Smith et al. (2011), manipulating
simulated data parameters (Galaxy Zoo Mergers)
and text transcription ("Old Weather").

To aid in the development of the Zooniverse as a commu- 
nity of citizen scientists, and to enable users to engage in in- 
quiry related to the data for a given Zoo, a discussion tool 
was recently developed to replace the forum structure used 
in GZ. The new discussion tool (called "Talk") was launched 
with the Milky Way Project and encourages users to create 
collections of objects, share information and join in online 
discussions. Several social media features such as tagging, 
tag clouds, "trending" and "recent" toggles improve Talk over 
the older forum structure, while retaining the primary collaborative
functions such as the discussion boards in Galaxy Zoo.
The Zooniverse team has already seen a marked increase in
traffic to Talk compared to the number of users navigating to
the old forum structure.

see http://www.citizensciencealliance.org

http://www.zooniverse.org

The Zooniverse team also has developed numerous educa- 
tion resources and continues to conduct education research 
into the motivations of the volunteers to contribute to tasks, 
how usage patterns vary over different levels of engagement 
with the project and whether there is any gain in understand- 
ing the process of research - just to name a few of the topics. 
Further description of these efforts is outside the scope of this 
paper. 

5.1. Tasks suitable for the Zooniverse 

One of the difficulties for the Zooniverse is understanding 
the types of tasks that are suitable for citizen scientists. The 
original Galaxy Zoo project primarily asked users to classify 
images. The interface was simple, with only a few buttons 
to click for every image. Some of the early success of this 
project might have been due to the simple requirements of 
this task. 

The newer Galaxy Zoo 2, Hubble Zoo and Galaxy Zoo Supernova
projects are also based on having volunteers do classifications
on images. However, these projects use a context-based
decision tree to ask more detailed morphological questions
about the objects, rather than just using a single classification
of an object. If the galaxy is a spiral, does it have
a bar at the center? How many spiral arms are visible? Although
each question only has a few possible answers, the
data for each object has more detail than would be possible
from a single question interface.
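
The idea of a context-based decision tree can be sketched as a small
lookup table in which each answer names the next question to ask. The
question wording, the answer labels and the layout of the tree below
are illustrative assumptions, not the actual GZ2 tree.

```python
# Each node maps an answer to the key of the next question,
# or to None when that path through the tree ends.
TREE = {
    "shape": {
        "question": "Is the galaxy smooth, or does it have features or a disk?",
        "answers": {"smooth": None, "features": "spiral", "artifact": None},
    },
    "spiral": {
        "question": "Is there any sign of a spiral arm pattern?",
        "answers": {"yes": "bar", "no": None},
    },
    "bar": {
        "question": "Is there a bar at the center?",
        "answers": {"yes": "arms", "no": "arms"},
    },
    "arms": {
        "question": "How many spiral arms are visible?",
        "answers": {"1": None, "2": None, "3": None, "4+": None},
    },
}

def classify(answer_for, start="shape"):
    """Walk the tree, calling answer_for(question_key) at each node.

    Returns the list of (question_key, answer) pairs along the path taken.
    """
    path = []
    node = start
    while node is not None:
        answer = answer_for(node)
        if answer not in TREE[node]["answers"]:
            raise ValueError(f"unknown answer {answer!r} for {node!r}")
        path.append((node, answer))
        node = TREE[node]["answers"][answer]
    return path
```

A volunteer who answers "smooth" to the first question is never asked
about bars or arms, which is what makes the tree context-based while
keeping every individual question simple.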

The GZ "Mergers" project operates in a fundamentally different
way from the other Zoos. Users are not asked to classify
images, but rather to match simulated images to data. In
some cases, none of the simulations presented are similar to 
the target galaxy. In other cases, there are several selections 
possible within the main interface. After selecting an image, 
users then have the option of enhancing it using a Java applet. 
By using two dimensional sliders, the users can generate new 
simulations to try to make their results match the target image 
more closely. The overall approach of this project has some 
similarities to on-line citizen science games like "Fold-It!".
The primary difference is the lack of an objective score for the 
goodness-of-fit. In the case of mergers, we do not have such 
an objective fitness function, as one of the goals of the project 
is to create a sufficiently large sample of galaxy mergers that 
such a function could be derived. The users therefore have to 
use their best judgement to determine the goodness-of-fit. 

When the GZ "Mergers" project first started, there were a 
large number of images that were being viewed every day. Of 
the images being viewed, approximately 5% were selected by 
the volunteers as possible matches to the target galaxy. Upon 
inspection, the science team found that a high fraction (up 
to 95%) of the selected simulations were not likely matches to the
real galaxies, as it appeared that many simulations were in- 
advertently selected, or inexperienced users tried to select too 
many simulated galaxies as matches. To increase the frac- 
tion of good matches, the team created a second level inter- 
face called "Merger Wars". In this interface, volunteers were 
given the opportunity to select the best of the simulation images
by allowing the full suite of simulated images to compete
with each other in one-to-one competitions, i.e., users were
shown only two simulations at a time, and asked to pick the
best one, and then iterate. Although some of this analysis
is still underway, the science team believes that the selection
rate for good matches has dramatically improved. A larger
fraction of the originally selected simulations get zero votes
in this second-level competition.
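
A minimal sketch of how such one-to-one competitions could be tallied
is given below. It assumes that each contest is recorded as a (winner,
loser) pair and ranks simulations by the fraction of contests they
won; the function name, the record format and the ranking rule are
assumptions made for illustration, not the scoring used by the Mergers
science team.

```python
from collections import Counter

def rank_by_wins(contests, candidates):
    """Rank simulations by the fraction of head-to-head contests they won.

    contests:   iterable of (winner_id, loser_id) pairs.
    candidates: all simulation ids that entered the competition.
    Returns (ranking, zero_vote_ids).
    """
    wins = Counter()
    played = Counter()
    for winner, loser in contests:
        wins[winner] += 1
        played[winner] += 1
        played[loser] += 1

    def win_fraction(sim):
        return wins[sim] / played[sim] if played[sim] else 0.0

    # Highest win fraction first; ties are broken by id for reproducibility.
    ranking = sorted(candidates, key=lambda s: (-win_fraction(s), s))
    zero_votes = [s for s in candidates if wins[s] == 0]
    return ranking, zero_votes
```

Simulations that never win a contest end up in the zero-vote list and
can be dropped before any further analysis.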

In the Planet Hunters site, users are not looking at images,
but rather are presented with time series data on the
light curves of nearby stars, and are tutored on how to recognize
the signature of an extrasolar planet in that data. Despite
the seemingly esoteric nature of light curve data, this Zooniverse
project has been very successful, proving that citizen
scientists are happy to deal with more complex types of data,
possibly because the scope for discovery is high.

In addition to classification and matching, citizen scien- 
tists are also being asked to do measurements on images. In 
the "Solar Storm Watch", "Moon Zoo", and "Milky Way" 
projects, volunteers use drawing tools to identify features. In 
the "Moon Zoo", for example, volunteers are asked to draw 
circles around the rims of craters. A similar process is used 
to identify bubbles in the interstellar medium in the "Milky 
Way" project. 

In some ways, these last projects require more advanced 
skills and more patience than just clicking through a classifi- 
cation tree. However, with the right interface and the right 
users, very good results can be obtained on these types of 
projects. 

A key observation from all of these Zooniverse projects, 
and from the forums and Talk interactions, is that some of the 
volunteers have very advanced abilities and interests. There 
is a great deal of effort being dedicated to developing a suite of
tools that allow these users to do additional scientific inves- 
tigations on their own and, as discussed above, some of the 
most interesting discoveries come from the users themselves. 

5.2. Data Mining the Zooniverse Results 

One of the key features of the Zooniverse project is the ap- 
plication of machine learning (data mining) algorithms to the 
Zooniverse volunteer-contributed tags. These tag data them- 
selves generate a significant volume of data (e.g., the many 
hundreds of millions of galaxy classifications from Galaxy 
Zoo). Finding correlations and trends among these user- 
contributed tags alongside automatically measured parame- 
ters of the same objects within the science database (e.g., 
the SDSS object catalog) will enable the development of im- 
proved classification and anomaly-detection algorithms for 
future sky surveys (such as the Large Synoptic Survey Tele- 
scope (LSST)), which will measure properties for at least 100 
times more galaxies, 100 times more stars, and 100 thousand 
times more source observations. 

http://fold.it/portal/

For example, a preliminary study of the galaxy mergers
found in the Galaxy Zoo I project was carried out (Baehr
et al. 2010). It was found that certain parameters
in the SDSS science database correlated strongly
with how often Galaxy Zoo users identified an object as
a merger. These database attributes included: (a) the log-likelihood
that the galaxy's surface brightness profile was
fit neither by an exponential disk (the lnLExp_u attribute
in the PhotoObjAll table) that is typical of spiral/disk
galaxies nor by a de Vaucouleurs profile (the lnLDeV_u
attribute in the PhotoObjAll table) that is typical of elliptical
galaxies; (b) a gradient in the position angle of the
isophotal major axis of the galaxy (the isoAGrad_u attribute
in the PhotoObjAll table up to Data Release 7);
and (c) the galaxy's "texture" (the texture_u attribute 
in the PhotoObjAll table up to Data Release 7), which 
is essentially the RMS (root-mean-square) variation of the 
galaxy's surface brightness profile relative to one of the stan- 
dard galaxy profile-fitting functions. In hindsight, it could 
have been predicted that these parameters would be useful in 
distinguishing normal (undisturbed) galaxies from abnormal 
(merging, colliding, interacting, disturbed) galaxies. These 
results may now be applied to future sky surveys, to improve 
the automatic (machine-based) classification algorithms for 
colliding and merging galaxies. All of this was made possible 
by the fact that the galaxy classifications provided by Galaxy 
Zoo I participants led to the creation of the largest pure set 
of visually identified colliding and merging galaxies yet to be 
compiled for use by astronomers. 
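
The kind of check described above can be sketched as follows. The
attribute names lnLExp_u and lnLDeV_u are those quoted in the text;
everything else is an assumption made for illustration: merger_fraction
is a hypothetical name for the fraction of volunteers who marked an
object as a merger, the four rows are invented rather than SDSS
measurements, and taking the better of the two profile fits as a single
"fit by neither" score is our simplification, not necessarily the
procedure of Baehr et al. (2010).

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def best_profile_lnl(row):
    """Log-likelihood of whichever standard profile fits better.

    A very negative value means neither the exponential disk nor the
    de Vaucouleurs profile describes the galaxy well.
    """
    return max(row["lnLExp_u"], row["lnLDeV_u"])

# Invented rows for illustration only.
rows = [
    {"lnLExp_u": -5.0, "lnLDeV_u": -40.0, "merger_fraction": 0.02},
    {"lnLExp_u": -60.0, "lnLDeV_u": -8.0, "merger_fraction": 0.05},
    {"lnLExp_u": -150.0, "lnLDeV_u": -120.0, "merger_fraction": 0.40},
    {"lnLExp_u": -300.0, "lnLDeV_u": -280.0, "merger_fraction": 0.75},
]

r = pearson(
    [best_profile_lnl(row) for row in rows],
    [row["merger_fraction"] for row in rows],
)
```

With these invented numbers the coefficient is strongly negative: the
worse both standard profiles fit, the more often the object is tagged
as a merger.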

Another example of machine learning using Galaxy Zoo 
classifications is provided in Banerji et al. (2010), who trained
a neural network on a subset of the GZ1 data, and (depending
on the automatic measurements given to the algorithm) could 
reproduce the classifications to better than 90%. They con- 
cluded that Galaxy Zoo would provide an invaluable training 
set for future algorithms likely to be developed to classify the 
next generation of wide-field imaging surveys. 
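
To make the notion of a crowd-sourced training set concrete, the sketch
below turns volunteer vote fractions into labels and fits a very simple
classifier to them. This is a stand-in chosen for brevity (logistic
regression fitted by gradient descent), not the neural network of
Banerji et al. (2010); the two features, the vote fractions and the
0.5 labelling threshold are all invented for illustration.

```python
import math

def train_logistic(features, labels, epochs=2000, lr=0.5):
    """Fit weights and bias by plain gradient descent on the log-loss."""
    n_features = len(features[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(x):
                grad_w[j] += (p - y) * xj
            grad_b += p - y
        w = [wi - lr * g / len(features) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(features)
    return w, b

def predict(w, b, x):
    """Return 1 (spiral) or 0 (elliptical) for one feature vector."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

# Invented training set: (colour, concentration) measurements and the
# fraction of volunteers who voted "spiral". Not real GZ1 data.
crowd = [
    ((0.20, 0.30), 0.90),
    ((0.30, 0.20), 0.80),
    ((0.25, 0.35), 0.85),
    ((0.80, 0.70), 0.10),
    ((0.70, 0.90), 0.20),
    ((0.90, 0.80), 0.05),
]
features = [x for x, _ in crowd]
labels = [1 if vote_fraction >= 0.5 else 0 for _, vote_fraction in crowd]

w, b = train_logistic(features, labels)
agreement = sum(predict(w, b, x) == y for x, y in zip(features, labels)) / len(labels)
```

The agreement value plays the role of the reproduction rate quoted
above: the fraction of objects for which the algorithm returns the same
class as the volunteers.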

5.3. Future citizen science projects 

With all the recent activities in the Zooniverse, it is important
to consider the implications that citizen science has for
future astronomical projects (e.g. LOFAR (Falcke et al. 2007)
and the Dark Energy Survey). For example, we briefly
consider here how volunteers might help with a project like
the Large Synoptic Survey Telescope (LSST) (LSST Science
Collaboration 2009), when it comes online this decade.



During the first Galaxy Zoo project, volunteers examined 
images of approximately one million galaxies. The storage 
space needed for all of these images was only a few Terabytes 
and therefore relatively easy to host and serve. In contrast, 
LSST will generate tens of Terabytes per day, and over its
approximately ten-year operational lifetime it is estimated that it
will generate tens of Petabytes.

Among the citizen science projects that may contribute to 
LSST science are those that explore the time series data from 
the survey. Since LSST will do repeated imaging of the sky 
over the 10-year project duration, each of the roughly 50 bil- 
lion objects observed by LSST will have approximately 1000 
separate observations. These 50 trillion time series data points 
will provide an enormous opportunity to discover all types of 
rare phenomena, rare objects, rare classes, and new objects, 
classes, and sub-classes. 
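
The orders of magnitude quoted above can be checked with a few lines of
arithmetic. The object and observation counts are those given in the
text, and 400,000 is the number of registered volunteers quoted in the
acknowledgements; the figure of 10 Terabytes per day (the low end of
"tens of Terabytes"), observing on every day of the year, and 1
Petabyte = 1000 Terabytes are assumptions made only for this estimate.

```python
# Figures quoted in the text.
objects = 50 * 10**9            # roughly 50 billion objects
observations_each = 1000        # approximately 1000 observations per object
data_points = objects * observations_each   # 50 * 10**12, i.e. 50 trillion

# Assumed for illustration: 10 TB per day, data taken every day for ten years.
tb_per_day = 10
survey_days = 10 * 365
total_pb = tb_per_day * survey_days / 1000  # 36.5 PB, i.e. tens of Petabytes

# If every registered volunteer shared the time series points equally.
volunteers = 400_000
points_per_volunteer = data_points // volunteers  # 125 million points each
```

Even under these generous assumptions each volunteer would face over a
hundred million data points, so the crowd cannot inspect the data set
as a whole.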

No group of volunteers could hope to view all such data 
being generated from LSST. At the same time, the science 
team on the project will have no hope of keeping up with such a
data flow from the system. Obviously, automatic algorithms 
need to be used to triage the data and do basic classification 
of events. However, even with automatic classification, it is 
anticipated that tens of thousands (or more) anomalous events 
will be detected every day. Some of these might be astronom- 
ically significant (asteroids, supernovae, etc). However, many 
will not fall into any particular category, and in many cases, 
they might be some kind of noise (an airplane flying in the 
field of view). 



http://www.darkenergysurvey.org



The contributions of human participants may include: de- 
tection of unusual light curves in rotating asteroids; human- 
assisted search for best-fit models of these asteroids (includ- 
ing shapes, spin periods, and varying surface reflection prop- 
erties); discovery of unusual variations in known variable 
stars; discovery of interesting objects in the environments 
around variable objects; discovery of associations among 
multiple variable and/or moving objects in a field; and more. 
This is especially important for the nightly event stream - per- 
haps 100,000 new events will be detected each and every night 
for 10 years. There are not enough observing facilities or pro- 
fessional astronomers (or graduate students) in the world to 
follow up on each of these events. Engaging a large cohort 
of willing participants to examine these events will contribute 
significantly to the scientific discovery efficiency and effec- 
tiveness of the LSST survey: citizen scientists may explore 
this massive event stream for novel and interesting features, 
thereby characterizing the behavior of each such object. The 
creation of a "characterization database" of time-varying ob- 
jects (from which astronomers may query, search, and retrieve 
events based upon prescribed characteristic light curve behav- 
iors) may prove to be one of the most significant contributions 
of citizen scientists to the LSST project - i.e., the development 
of a major externally joined database component of the LSST 
science data collection. 

Something like this approach was effectively used with the
detection of the "Peas" described above (Section 4.2), i.e.,
once this class of objects was discovered, and determined to
be interesting, a computer algorithm was developed to find
them (Cardamone et al. 2009). By finding new classes of data,
volunteers can make major contributions to science that would
not be possible without their help. The Galaxy Zoo Supernova
project also uses a similar methodology (Smith et al. 2011).
During an observing run, the science team receives tens of 
thousands of possible supernova candidates, and automatic 
algorithms are used to reduce this to a few hundred events 
a day that are likely supernovae. With the help of citizen sci- 
entists, these candidate supernovae are visually checked and 
thus confirmed for follow-up observations in real-time. 
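
The two-stage workflow described above, an automatic cut followed by a
human check, can be sketched as below. The field names, the score
threshold and the voting rule are hypothetical choices made for
illustration and are not the criteria used by the Galaxy Zoo Supernova
project.

```python
def triage(candidates, score_threshold=0.8):
    """Machine step: keep only candidates the automatic score rates as likely."""
    return [c for c in candidates if c["score"] >= score_threshold]

def confirm(candidate_votes, min_votes=3, min_yes_fraction=0.7):
    """Human step: confirm candidates on which enough volunteers agree.

    candidate_votes maps a candidate id to a list of booleans, where
    True means the volunteer judged the candidate to look real.
    """
    confirmed = []
    for candidate_id, votes in candidate_votes.items():
        if len(votes) < min_votes:
            continue  # not yet inspected by enough volunteers
        if sum(votes) / len(votes) >= min_yes_fraction:
            confirmed.append(candidate_id)
    return confirmed
```

In use, the output of triage would be shown to volunteers, and only the
ids returned by confirm would be passed on for follow-up observations.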

In summary, as the data rates increase, and we become fur- 
ther dependent on automatic classification algorithms, citizen 
scientists can play a crucial role in reviewing subsets of the 
data and identifying anomalies. The algorithms can then be 
adapted by the science teams to increase their success rate 
(based on the visual checks). The two methodologies will 
need to work in tandem and the process will likely be itera- 
tive. 

6. GALAXY ZOO IN THE CONTEXT OF OTHER CITIZEN SCIENCE
PROJECTS 

Galaxy Zoo is certainly not the only citizen science project. 
As mentioned in the beginning of this chapter, one of the inspirations
for Galaxy Zoo was the Stardust@Home project
(Westphal et al. 2006). This was one of the few projects at
the time where volunteers were asked to participate in the 
data analysis of a project rather than the data collection phase 
of a project. Many of the historically significant citizen sci- 
ence projects, such as the Audubon Society's Christmas Bird 
Count program (started in 1900) and the American Association
of Variable Star Observers' variable star observations
project (started in 1911), were based on data collection. With
the advent of the internet as a distribution system, citizen sci- 
ence projects could move into work on data analysis. Here 
we make a distinction between distributed analysis projects
and distributed computation projects such as SETI@Home,
which utilizes the computational power of over a million idle
computers belonging to volunteers to process radio data looking
for signals that could indicate extra-terrestrial intelligence.
Distributed analysis projects require the brain - or the "wetware"
- of the volunteer to be engaged, not just their computer.
One of the earliest distributed data analysis projects (dating
from 2001) was Clickworkers, which asked the public to count
the number of craters on maps of the Martian surface returned
by the Mars Orbiter Camera (MOC). While the project was
successful in recruiting sufficient volunteers to identify over
800,000 MOC craters (Gulick et al. 2010), there are few scientific
results published as of yet on this body of work. Clickworkers
has morphed into a project with a more game-like interface,
where the public are currently asked to "tag" surface
features on images from the Mars Rovers, Spirit and Opportunity.
Another project with a game-like interface is FoldIt!,
which asks the public to help solve protein folding "puzzles".
Both the new Clickworkers and the FoldIt! projects require
the user to download an application. In the context of distributed
data analysis projects, Galaxy Zoo (and its successor
projects in the Zooniverse) is quite probably the largest both
in terms of number of registered volunteers world-wide as
well as number of peer-reviewed papers published based on
data processed by volunteers.

There are many excellent citizen science projects in ecol- 
ogy, animal studies and other disciplines where the distributed 
nature of data collection is critical to the success of the 
project. For example, Cornell University's Lab of Ornithol- 
ogy Feeder Watch Project asks volunteers to enter counts of 
bird species into an online form to track winter bird popula- 
tions, and the CoCoRaHS Network (Community Collaborative 
Rain, Hail and Snow Network) has thousands of volunteers 
across all fifty of the United States who have installed sen- 
sors outside their homes and record amounts of rain, snow 
and hail. Thus, one potential trend in citizen science will 
be projects that link the distributed data collection and dis- 
tributed data analysis aspects of their work. 

7. CONCLUSIONS 

In this chapter we have presented a brief overview of the 
Galaxy Zoo and Zooniverse projects. We gave a short dis- 
cussion of the history and motivation for the original Galaxy 
Zoo, as well as the motivations to extend it to "Galaxy Zoo 
2", "Hubble Zoo" and an entire "Zooniverse" of citizen sci- 
ence projects. We have described the highlights of the many 
scientific results that have already come from Galaxy Zoo. 

We went on to discuss what makes a good citizen science 
project, and why we think Galaxy Zoo was so successful. We 
described the importance of having a central portal, the Zooni- 
verse, as a gateway to citizen science projects across multi- 
ple disciplines. We then considered likely future applications 
of community-based science in the coming data-rich era. Fi- 
nally, to provide a context for the importance of the Galaxy 
Zoo project, we presented a short description of various other 
citizen science projects and modalities. 

Galaxy Zoo and the many Zooniverse projects would 
not have been possible without the participation of 
now over 400,000 volunteers who have registered 
with the Zooniverse. The contributions of volun- 
teers to Galaxy Zoo are individually acknowledged at 
http://www.galaxyzoo.org/Volunteers.aspx, 
and volunteers who classified in Galaxy Zoo 2 
(and wished to be acknowledged) are listed at 
http://zoo2.galaxyzoo.org/authors. The 
work described in this paper is funded in part by The Lever- 
hulme Trust (UK), the National Science Foundation and the 
National Aeronautics and Space Administration (US). 